Variable pointsize #251

teunbrand · 2024-01-17T20:21:51Z

This PR aims to fix #83.

Briefly, it takes Paul's advice here to cast everything to an absolute unit as soon as possible, and then refuse to deal with other types of units ever again (unless absolutely needed). This should make the segments attach nicely to the normally sized circle point (shapes 1, 16, 19, 21, but not 20). This is also stable upon resizing the device window.
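
As a rough illustration of the "convert once, then stay absolute" idea, here is a minimal sketch using grid's unit conversion. `to_cm()` is a hypothetical helper written for this example, not a function from this PR, and treating bare numbers as centimetres is an assumption.

```r
library(grid)

# Null PDF device: lets grid resolve units without opening a window or file.
pdf(NULL)

# Hypothetical helper: turn any grid unit into plain centimetre numbers.
to_cm <- function(x, axis = c("x", "y")) {
  axis <- match.arg(axis)
  if (!is.unit(x)) return(x)  # assumption: bare numbers are already cm
  if (axis == "x") {
    convertWidth(x, "cm", valueOnly = TRUE)
  } else {
    convertHeight(x, "cm", valueOnly = TRUE)
  }
}

# Mixed units go in, unit-less centimetre values come out.
padding    <- unit(c(0.25, 1), c("lines", "mm"))
padding_cm <- to_cm(padding)
inch_cm    <- to_cm(unit(1, "in"))
```

After this point everything is an ordinary numeric vector, so later arithmetic no longer depends on the viewport the units were created in.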

Reprex from #83 (comment):

devtools::load_all("~/packages/ggrepel")
#> ℹ Loading ggrepel
#> Loading required package: ggplot2

set.seed(42)
d <- data.frame(
  x = c(1, 2, 2, 3),
  y = c(1, 2, 3, 1),
  pointsize = c(0, 2, 1, 0),
  label = sprintf("label%s", 1:4)
)

p <- ggplot(d, aes(x, y)) +
  geom_text_repel(
    min.segment.length = 0.1,
    box.padding = 0.2,
    aes(label = label, point.size = pointsize),
    size = 5, max.iter = 1e5, max.time = 1
  ) +
  continuous_scale(
    aesthetics = c("size", "point.size"), 
    palette = scales::area_pal(c(2, 50)), guide = "none"
  )
  
p + geom_point(aes(size = pointsize), color = "grey50")

Shape = 20 has a smaller diameter, so the segment doesn't attach for that shape.

p + geom_point(aes(size = pointsize), color = "grey50", shape = 20)

Created on 2024-01-17 with reprex v2.1.0

I had to update 1 test that didn't draw any segments by default.
Visual tests needed updating due to tiny changes in location of some of the labels, usually something like 0.01 unit in a scale of 100s. I presume these are rounding errors and otherwise harmless.

Please have a go and play around a bit yourself, as I haven't considered every option under the sun for this.

teunbrand · 2024-01-17T20:23:01Z

R/geom-label-repel.R

+  # `.pt / .stroke` ~= 0.75 accounts for a historical error in ggplot2
+  point.size <- x$data$point.size
+  point.size[is.na(point.size)] <- 0
+  point.size <- point.size * .pt / .stroke / 20
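
The `~= 0.75` in the comment can be checked by hand. This is a minimal sketch that restates ggplot2's `.pt` and `.stroke` constants locally so it runs on its own; the divisor 20 is copied from the diff as-is, and the example sizes are made up.

```r
# Values of ggplot2's exported constants, restated so ggplot2 is not needed.
.pt     <- 72.27 / 25.4  # points per millimetre
.stroke <- 96 / 25.4     # pixels per millimetre at 96 dpi

# The millimetre terms cancel, leaving 72.27 / 96, which is roughly 0.75.
ratio <- .pt / .stroke

# Same steps as the diff: missing sizes become 0, then everything is rescaled.
point.size <- c(NA, 2, 1, 0)   # made-up example values
point.size[is.na(point.size)] <- 0
scaled <- point.size * .pt / .stroke / 20
```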


I think this here was the magic incantation you were looking for in #83 (comment)

Amazing, I don't think I would have ever figured this out by myself. 🙏

You're a wizard 🧙‍♂️

slowkow · 2024-01-22T16:12:51Z

Some of the examples failed to build after these changes, so I had to do some fiddling with NA and converting to units.

After I fixed that, the forces seem to be a bit off now. Some examples still look good, but others have too little force so the labels are squished together or on top of each other.

Playing with units is harder than I thought. But I'll keep trying and update here when I feel like I have some good results, because I think your solution makes the segments the perfect length.

teunbrand · 2024-01-22T18:43:05Z

Thanks for the heads up and trying out various options! Could you point me to a specific example that would be broken with this PR? I could try and have a go at preventing any serious damage.

slowkow · 2024-01-23T20:48:38Z

The variable point size example is perfect! It's incredibly satisfying to see the segment lengths perfectly calculated. 💯

But there are a few examples where my arbitrary choice of force magnitude is now a bit off after switching the unit of measurement.

Example 1

Here is one example that used to look good, but now it is squished:

set.seed(42)
ggplot(mtcars, aes(x = wt, y = 1, label = rownames(mtcars))) +
  geom_point(color = "red") +
  geom_text_repel(
    force_pull   = 0, # do not pull toward data points
    nudge_y      = 0.05,
    direction    = "x",
    angle        = 90,
    hjust        = 0,
    segment.size = 0.2,
    max.iter = 1e4, max.time = 1
  ) +
  xlim(1, 6) +
  ylim(1, 0.8) +
  theme(
    axis.line.y  = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.y  = element_blank(),
    axis.title.y = element_blank()
  )

Example 2

Here is another one where the result looks worse than it did before; the number of iterations might be too low with the new units:

set.seed(42)
mtcars$label <- rownames(mtcars)
mtcars$label[mtcars$mpg < 25] <- ""
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), label = label)) +
  coord_polar(theta = "x") +
  geom_point(size = 2) +
  scale_color_discrete(name = "cyl") +
  geom_text_repel(show.legend = FALSE) + # Don't display "a" in the legend.
  theme_bw(base_size = 18)

Possible approaches

I can think of two approaches to try to address this. The first should probably be trivial to implement (but I failed, so maybe not). The second is a lot more work but probably more robust. There are probably more approaches that I have not yet considered.

Approach 1

The first approach is to figure out what scaling factors to change. For example, I tried fiddling with these two numbers:

ggrepel/R/geom-text-repel.R

Lines 438 to 439 in 1144585

    
    force_push      = x$force * 1e-6,
    force_pull      = x$force_pull * 1e-2,

After some playing, I could not perfectly restore the old behavior in ggrepel 0.9.4. I tried changing 1e-6 to 1e-5 and 1e-4, and I think 1e-4 was pretty good, but still a bit off.

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

Maybe there are smarter ways of choosing values for force magnitudes? I don't know, but I bet someone has done this in the past.
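
To see why a fixed constant stops working when the coordinate units change, here is a toy calculation. It assumes an inverse-square repulsion, which is an assumption made for illustration and not a description of ggrepel's actual C++ code; `repel_step()` and the panel width `s` are hypothetical.

```r
# Toy repulsion: the step size shrinks with the squared distance.
repel_step <- function(d, k) k / d^2

k      <- 1e-6   # push constant from the snippet above
d_unit <- 0.05   # gap between two labels in 0-1 coordinates
s      <- 10     # hypothetical panel width in centimetres

# Step as a fraction of the panel width, in 0-1 coordinates.
step_unit <- repel_step(d_unit, k)

# Same layout measured in centimetres: distances grow by s, and the
# resulting step has to be divided by s to express it as a fraction again.
step_cm <- repel_step(d_unit * s, k) / s

# Under this toy model the relative step shrinks by a factor of s^3.
shrink <- step_cm / step_unit

# Rescaling the constant by s^3 restores the original relative step.
step_fixed <- repel_step(d_unit * s, k * s^3) / s
```

Under these assumptions a single multiplier only works when the width and height are equal, which may be one reason a hand-tuned constant ends up close but not exact.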

Approach 2

The second approach is to modify the positioning algorithm so it is more robust. Some time after I figured out how to use a physical repulsion/attraction simulation to position labels, I found a totally different approach in d3 voronoi-labels. I like the Voronoi algorithm, because it is simpler and does not depend on arbitrary numbers for force calculations. I wish I could go back in time and do it that way from the start, but this has been a learning journey for me, so ggrepel has a physical simulation instead.

Some day, I might consider changing to a Voronoi algorithm in a new version of ggrepel. It may be possible to keep the old simulation algorithm around for backwards-compatibility if people like the way it looks. But a newer algorithm using Voronoi calculations will probably be more efficient and robust. Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement, and I bet the arbitrary choices of force magnitudes will be less important when we have nice initial positions.

Also, I realize this PR is supposed to be about segment length, so I understand that the Voronoi idea is a bit much — I just wanted to share the ideas bouncing around in my head. Feel free to ignore! Just a thought.

teunbrand · 2024-01-24T20:22:36Z

Thanks for the examples Kamil. I agree with you that these look worse than before.

Knowing nothing about the internals of the attraction/repulsion simulation underneath, I think the correct multiplier should be the width or height, as everything is now scaled from [0, 1] to [0, width] and [0, height]?

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

In theory you should be able to divide all x's by width and y's by height and you should have the calculation similar to before when everything was in native units. You'd also have to re-multiply after the repelling algorithm I suppose.
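
A minimal sketch of that divide, repel, re-multiply round trip. `repel_in_unit_square()`, `repel_fn` and `nudge()` are hypothetical names for this example; `nudge()` is only a stand-in for the real repelling algorithm.

```r
# Hypothetical wrapper: run a repel function in 0-1 space, return centimetres.
repel_in_unit_square <- function(x, y, width, height, repel_fn) {
  out <- repel_fn(x / width, y / height)        # centimetres -> 0-1
  list(x = out$x * width, y = out$y * height)   # 0-1 -> centimetres
}

# Stand-in "algorithm": push every label right by 1% of the panel width.
nudge <- function(x, y) list(x = x + 0.01, y = y)

res <- repel_in_unit_square(
  x = c(2, 8), y = c(1, 3),
  width = 16, height = 9,
  repel_fn = nudge
)
```

Label box sizes and paddings would presumably need the same division and re-multiplication, since they take part in the overlap checks as well.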

The Voronoi algorithm sounds nice, but rather than being the algorithm for label positioning, I think it would make a good initialiser (instead of random jitter, I suppose). For example, in those big Voronoi cells it doesn't make sense to place the label so far away when there is plenty of space near the point itself.

EDIT: Totally read over the remark suggesting something similar 😅

Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement,

slowkow · 2024-01-24T21:12:20Z

If you want to give the scaling a shot, I'd appreciate it! 😃

Otherwise, I will try to come back to this sometime in the future and see if I can get the scaling close enough so that people don't complain about any changes in performance. I feel like I really messed up by choosing "native" without understanding what it meant at the time. My current understanding is that "native" forces width and height to both be in the range 0-1 instead of allowing the width range to differ from the height range.

Anyway, thank you so much for the discussion and the help! There's always so much to learn.

teunbrand · 2024-12-02T19:51:52Z

Thanks for the discussion here Kamil! I'm closing this PR in favour of #265.

teunbrand added 7 commits January 17, 2024 20:58

import some extra grid functions

bda20bf

utilities for converting to centimetres

6ed31a5

convert everything to cm at the beginning

3304cc5

rest of calculations in centimetres

3accc4a

regenerate namespace

5d05f28

use min.segment.length in test

df78d74

accept rounding errors in visual tests

bc3a004

slowkow added 2 commits January 18, 2024 15:08

add Teun as a contributor

496fdd9

Merge branch 'master' into variable_pointsize

16e3493

guard against NA point padding

52885b5

teunbrand added 5 commits January 24, 2024 22:32

run repel algo in 0-1 scale

bffdd56

accept snapshots

1068154

Revert "accept snapshots"

56a739b

This reverts commit 1068154.

Revert "guard against NA point padding"

943f099

This reverts commit 52885b5.

Revert "run repel algo in 0-1 scale"

2cb455e

This reverts commit bffdd56.

teunbrand mentioned this pull request Dec 2, 2024

improve point size calculations #265

Open

teunbrand closed this Dec 2, 2024
